# An Optimized Design System for Flip-Flop Grouping Using Low Power Clock Gating

Dr. D. Mahesh Kumar

Assistant Professor in Electronics, PSG College of Arts & Science, Coimbatore - 14, Tamil Nadu, India.

Abstract - Power optimization has become increasingly important in recent years. Data-driven clock gating is a popular technique for reducing dynamic power dissipation and is used in many synchronous circuits. Dynamic power management (DPM) is a design methodology for dynamically reconfiguring systems to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components. Clock gating logic can be inserted manually into the register transfer level (RTL) design. In a sequential circuit, when a logic unit is clocked, its underlying sequential elements receive the clock signal regardless of whether or not they will toggle in the next cycle. Flip-flops are therefore grouped so that they share a common clock enabling signal, which reduces the hardware overhead, and the group size is chosen to maximize the power savings. We present a high-speed, wide-range parallel counter that achieves high operating frequencies through a novel pipeline partitioning methodology using only three simple, repeated CMOS logic modules. Look-ahead clock gating is integrated into an electronic design automation (EDA) flow built on a commercial back-end design flow, achieving total power reduction across various large-scale, state-of-the-art industrial and academic designs in 40 nm and 65 nm process technologies. The state look-ahead path prepares the counting path's next counter state prior to the clock edge, so that the clock edge triggers all modules simultaneously, concurrently updating the count state with a uniform delay at all counting-path modules/stages with respect to the clock edge.

Index Terms - Clock gating, clock networks, dynamic power reduction, multi-bit flip-flop.

## 1. INTRODUCTION

As the power and thermal demands of modern systems-on-chip (SoCs) grow with the increasing number of integrated transistors, power minimization has become one of the most important objectives in designing SoCs for various applications. High power dissipation not only increases the system cost of an SoC but also affects product lifetime and reliability. To optimize power consumption, many low-power design techniques have been introduced [2], such as clock gating [3], [4], replacing non-timing-critical cells with their high-Vt counterparts [5], [6], power gating [7], [8], creating multi-supply-voltage designs [5], dynamic voltage/frequency scaling [9], [10], and minimizing the clock network.

Modern digital systems are designed with a target clock period, which determines the rate of data processing. A clock network distributes the clock signal from the clock generator, or source, to the clock inputs, or sinks, of the synchronizing components or modules. The clock distribution network consumes a large percentage of the power consumed by these systems. Therefore, in low-power synchronous systems, we would like to minimize the total power consumed by the clock tree subject to the performance constraints on the clock signal, such as the operating frequency and the maximum clock skew.



Fig. 1 Gated clock tree, with synchronizing elements

The power consumed by complementary metal oxide semiconductor (CMOS) circuits consists of two components: dynamic and static power. The static power is largely determined by the technology, so in this paper we consider only the minimization of dynamic power. In a normal clock tree, the clock signal arrives at every clock sink in every cycle, which means that the switching activity factor of the clock is $\alpha = 1$. Suppose that we know the times at which each clock sink must be active; we refer to these active/idle times of a module as its activity pattern. Activity patterns can be obtained by simulating the design at the behavioural level. The clock signal then needs to be supplied to a module only during its active times. If the clock signal is gated so that it is delivered only during the active times, we can reduce the total power consumed both by the clock tree and by the modules themselves. A clock tree constructed in this way is called an activity-driven clock tree. In this paper, we address the problem of minimizing the power consumption of a synchronous system by minimizing its activity through the use of an activity-driven clock tree. Fig. 1 shows an example of a gated clock tree.
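As a rough illustration of why an activity-driven clock tree saves power, the standard CMOS expression $P_{dyn} = \alpha C V_{DD}^2 f$ can be evaluated for an ungated clock ($\alpha = 1$) and for gated subtrees whose $\alpha$ equals the fraction of cycles in which their module is active. This is a minimal sketch under assumed, hypothetical capacitances, supply voltage, frequency, and activity patterns; it ignores the root of the tree and the gating logic itself and is not the construction algorithm of this paper.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    """One gated clock subtree and the module it drives (values are hypothetical)."""
    cap_farads: float   # switched capacitance of the subtree plus its clock sinks
    active: list[bool]  # activity pattern: True if the module must be clocked that cycle


def dynamic_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """Standard CMOS dynamic power: P = alpha * C * Vdd^2 * f."""
    return alpha * cap * vdd ** 2 * freq


def clock_power(subtrees: list[Subtree], vdd: float, freq: float, gated: bool) -> float:
    """Clock power of all subtrees, ignoring root buffers and gater overhead."""
    total = 0.0
    for st in subtrees:
        # Ungated: every sink sees the clock each cycle (alpha = 1).
        # Activity-driven: alpha is the fraction of cycles the module is active.
        alpha = sum(st.active) / len(st.active) if gated else 1.0
        total += dynamic_power(alpha, st.cap_farads, vdd, freq)
    return total


if __name__ == "__main__":
    tree = [
        Subtree(cap_farads=2e-12, active=[True, False, False, True]),   # active 50% of cycles
        Subtree(cap_farads=1e-12, active=[False, False, False, True]),  # active 25% of cycles
    ]
    for gated in (False, True):
        p = clock_power(tree, vdd=1.0, freq=1e9, gated=gated)
        print(f"gated={gated}: {p * 1e3:.2f} mW")
```

With these assumed numbers the ungated tree dissipates 3.00 mW and the activity-driven tree 1.25 mW; a realistic comparison would also charge the power of the gates inserted into the tree.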

In this paper we focus on data-driven clock gating and look-ahead clock gating, which can be applied to flip-flops at the gate level. In data-driven gating, the clock signal driving a flip-flop is disabled when the flip-flop's state will not change in the next clock cycle [11]. Data-driven clock gating incurs area and power overheads that must also be considered when designing a circuit. To reduce these overheads, it has been proposed to group several flip-flops so that they are driven by the same gated clock signal, generated by ORing the enabling signals of the individual flip-flops. However, this may lower the disabling effectiveness. It is therefore beneficial to group flip-flops whose switching activities are highly correlated and to derive a joint enabling signal for them. In a recent paper, a model for data-driven gating was developed based on the toggling activity of the constituent flip-flops [13]. The optimal fan-out of a clock gate, yielding maximal power savings, is derived from the average toggling statistics of the individual flip-flops, the process technology, and the cell library in use. In any digital system, the state transitions of flip-flops depend on the data they process.
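The fan-out trade-off can be made concrete with a deliberately simplified cost model; it is an illustrative assumption, not the model derived in [13]. Assume each of the k flip-flops in a group toggles independently with probability p per cycle, so the shared gate can stop the clock only when none of them toggles, which happens with probability $(1-p)^k$. The per-flip-flop clock cost is then taken as $c_{ff}\,[1-(1-p)^k] + c_g/k$, where $c_{ff}$ (clock-pin cost of one flip-flop) and $c_g$ (cost of one gater, shared by the group) are hypothetical normalized constants.

```python
def clock_cost_per_ff(k: int, p: float, c_ff: float, c_g: float) -> float:
    """Normalized per-FF clock power for a group of k FFs sharing one gater.

    Simplified model (assumption): FFs toggle independently with probability p,
    so the group clock is enabled with probability 1 - (1 - p)**k, and the
    hypothetical gater cost c_g is amortized over the k FFs it drives.
    """
    enabled_prob = 1.0 - (1.0 - p) ** k
    return c_ff * enabled_prob + c_g / k


def optimal_fanout(p: float, c_ff: float, c_g: float, k_max: int = 64) -> int:
    """Brute-force search for the group size that minimizes per-FF clock power."""
    return min(range(1, k_max + 1), key=lambda k: clock_cost_per_ff(k, p, c_ff, c_g))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        k = optimal_fanout(p, c_ff=1.0, c_g=2.0)
        print(f"p={p:.2f}: best k={k}, cost per FF={clock_cost_per_ff(k, p, 1.0, 2.0):.3f}")
```

Under this toy model, lower toggling probabilities favor larger groups, which matches the qualitative trend described above; the exact optimum depends entirely on the assumed constants.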

In the next section, we review various clock gating techniques. Section 3 gives an overview of data-driven clock gating. Section 4 describes look-ahead clock gating, which motivates this paper. Section 5 presents the implementation and simulation results. Final conclusions are presented in Section 6.

## 2. VARIOUS CLOCK GATING TECHNIQUES

### AND CLOCK GATING

In AND-based clock gating, a single two-input AND gate is inserted into the clock path of a sequential circuit. One input of the AND gate is the clock, and the other is an enable signal that controls whether the clock reaches the output. Clock gating is applied to a counter by inserting one such AND gate. The counter is negative-edge triggered, and the enable changes at a negative clock edge, so it remains stable for a full cycle from one negative edge to the next.
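A minimal sampled-waveform sketch of AND-based gating follows, assuming an ideal zero-delay gate and a clock sampled four times per cycle; the function names are hypothetical. It illustrates why the enable should change only while the clock is low: if the enable drops in the middle of a high phase, the gated clock produces a shortened pulse whose premature falling edge could falsely trigger a negative-edge-triggered counter.

```python
def clock(cycles: int, half: int = 2) -> list[int]:
    """Sampled global clock: `half` samples high, then `half` samples low, per cycle."""
    return ([1] * half + [0] * half) * cycles


def and_gated_clock(clk: list[int], en: list[int]) -> list[int]:
    """AND-based gating (ideal, zero delay): gclk = clk AND en."""
    return [c & e for c, e in zip(clk, en)]


def high_pulse_widths(sig: list[int]) -> list[int]:
    """Width, in samples, of every run of 1s in a sampled waveform."""
    widths, run = [], 0
    for s in sig:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


if __name__ == "__main__":
    clk = clock(4)  # 16 samples: 1,1,0,0 repeated
    # Enable drops at a falling edge (sample 6, clock low): only full pulses pass.
    print(high_pulse_widths(and_gated_clock(clk, [1] * 6 + [0] * 10)))  # [2, 2]
    # Enable drops mid-high phase (sample 5): a 1-sample pulse with an early falling edge.
    print(high_pulse_widths(and_gated_clock(clk, [1] * 5 + [0] * 11)))  # [2, 1]
```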

### NOR CLOCK GATING

NOR-based clock gating is suitable when gating must be performed on the positive edge of the global clock. When the counter is analyzed with NOR gating, it counts only while the enable is ON. When the enable changes to "1", the counter output changes on the negative edge of the clock, and small glitches may occur.
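Under the same idealized assumptions (zero-delay gate, four samples per clock cycle, hypothetical helper names), a NOR gate fed by the clock and the inverted enable produces $\overline{clk} \cdot en$: its pulses occupy the low phase of the global clock. The sketch shows the kind of glitch mentioned above: if the enable rises while the global clock is already low, the first gated pulse is truncated, whereas an enable that changes during the high phase yields only full-width pulses.

```python
def clock(cycles: int, half: int = 2) -> list[int]:
    """Sampled global clock: `half` samples high, then `half` samples low, per cycle."""
    return ([1] * half + [0] * half) * cycles


def nor_gated_clock(clk: list[int], en: list[int]) -> list[int]:
    """NOR-based gating (ideal, zero delay): gclk = NOR(clk, NOT en) = (NOT clk) AND en."""
    return [int(not (c or not e)) for c, e in zip(clk, en)]


def high_pulse_widths(sig: list[int]) -> list[int]:
    """Width, in samples, of every run of 1s in a sampled waveform."""
    widths, run = [], 0
    for s in sig:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


if __name__ == "__main__":
    clk = clock(4)  # 16 samples: 1,1,0,0 repeated
    # Enable rises while the global clock is high (sample 1): every gated pulse is full width.
    print(high_pulse_widths(nor_gated_clock(clk, [0] + [1] * 15)))      # [2, 2, 2, 2]
    # Enable rises while the global clock is low (sample 3): the first pulse is truncated.
    print(high_pulse_widths(nor_gated_clock(clk, [0] * 3 + [1] * 13)))  # [1, 2, 2, 2]
```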

### SYNTHESIS-BASED CLOCK GATING

Synthesis-based clock gating is the method most widely used by EDA tools. Even after synthesis-based gating is applied, the utilization of the remaining clock pulses, measured by the data-to-clock toggling ratio, may still be very low. This was observed in the average data-to-clock toggling ratio obtained from extensive power simulations of 61 blocks comprising 200k FFs, taken from a 32nm high-end 64-bit microprocessor. These are mostly control blocks of the data path, register file, and memory management units of the processor. The technology parameters used throughout are those of a 22nm low-leakage process technology. The clock enabling signals of these blocks were derived by a mix of logic synthesis and manual definitions. The clock capacitive load accounts for 70% of their total load. When the blocks are ordered by increasing data-to-clock activity ratio, it is clear that the data toggle at a very low rate compared with the gated clocks.
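The data-to-clock toggling ratio can be estimated from simulation traces as the number of flip-flop output transitions divided by the number of clock pulses actually delivered to those flip-flops after synthesis-based gating. The trace format below (per-FF lists indexed by cycle) and the function name are hypothetical choices for illustration, not the format of any particular power-analysis tool.

```python
def data_to_clock_ratio(q: list[list[int]], clocked: list[list[bool]]) -> float:
    """Data-to-clock toggling ratio over a set of flip-flops.

    q[i][t]       -- output of FF i after cycle t (hypothetical trace format)
    clocked[i][t] -- True if a gated clock pulse reached FF i in cycle t
    Cycle 0 is treated as the initial state, so counting starts at cycle 1.
    """
    toggles = sum(
        1
        for qi in q
        for t in range(1, len(qi))
        if qi[t] != qi[t - 1]  # output changed, so a useful pulse was consumed
    )
    pulses = sum(sum(ci[1:]) for ci in clocked)  # pulses delivered after cycle 0
    return toggles / pulses if pulses else 0.0


if __name__ == "__main__":
    q = [[0, 0, 1, 1, 1], [0, 0, 0, 0, 1]]
    clocked = [[True] * 5, [False, True, False, True, True]]
    print(f"{data_to_clock_ratio(q, clocked):.2f}")  # 2 toggles / 7 pulses -> 0.29
```

A ratio far below 1 means that most delivered clock pulses do not change any flip-flop state, which is the waste that data-driven gating targets.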

## 3. DATA-DRIVEN CLOCK GATING TECHNIQUE

Clock enabling signals are very well understood at the system level and can therefore be defined effectively. These enabling signals capture the periods in which functional blocks and modules do not need to be clocked, and they are later synthesized automatically into clock enabling signals at the gate level. As part of the design methodology, clock enabling signals may also be added manually for every flip-flop. However, when modules at the high level and the gate level are clocked, the state transitions of their underlying flip-flops depend on the data being processed. It is important to note that the entire dynamic power consumed by a system stems from the periods in which the modules' clock signals are enabled. Therefore, however short these enabled periods are, assessing the effectiveness of clock gating requires extensive simulations and statistical analysis of flip-flop toggling activity.



Fig. 2 Practical data-driven clock gating

A flip-flop can determine that its clock may be disabled in the next cycle by XORing its output with its present data input, which is the value that will appear at its output in the next cycle. The outputs of k XOR gates are ORed to generate a joint gating signal for k flip-flops, which is then latched to avoid glitches. The combination of a latch and an AND gate is commonly used by commercial tools and is called an integrated clock gate (ICG). This type of data-driven gating has been used for a digital filter in an ultralow-power design. There is a clear trade-off between the number of saved clock pulses and the hardware overhead. As k increases, the hardware overhead decreases, but so does the probability that the group clock can be disabled, as determined by ORing the k enable signals. Let the average toggling probability of a flip-flop be denoted by $p$ ($0 < p < 1$). Such a gating scheme has considerable timing implications, which are discussed in [14].
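The XOR-OR enable generation can be checked with a cycle-level sketch of one group of k flip-flops. The sketch abstracts away the latch and AND gate of the ICG (it assumes zero delay and models only whether a clock edge reaches the group in each cycle), and the function name is hypothetical. The key property it demonstrates is that skipping an edge is safe: when the group enable is 0, every D equals its Q, so the state would not have changed anyway.

```python
def simulate_ddcg(d_seq: list[list[int]], q0: list[int]) -> tuple[list[list[int]], int]:
    """Cycle-level model of data-driven clock gating for one group of k FFs.

    d_seq[t][i] is the D input of FF i during cycle t. Each FF's D XOR Q flags
    a pending change; the k flags are ORed into one group enable. In hardware
    this enable is latched inside an ICG to avoid glitches; this zero-delay
    model only decides whether the group receives a clock edge.
    Returns the sequence of group states and the number of delivered pulses.
    """
    q = list(q0)
    history = [list(q)]
    pulses = 0
    for d in d_seq:
        enable = any(di ^ qi for di, qi in zip(d, q))  # OR of the k XOR outputs
        if enable:           # the ICG passes this clock edge
            q = list(d)      # every FF in the group samples its D input
            pulses += 1
        history.append(list(q))
    return history, pulses


if __name__ == "__main__":
    d_seq = [[0, 0], [0, 0], [1, 0], [1, 0], [1, 1]]
    hist, pulses = simulate_ddcg(d_seq, q0=[0, 0])
    print(f"{pulses} of {len(d_seq)} clock pulses delivered")  # 2 of 5
```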

## 4. LOOK AHEAD CLOCK GATING TECHNIQUE

Early design methodologies improved counter operating frequency by partitioning large counters into multiple smaller counting modules, in which each module of higher significance was enabled only when all bits in all modules of lower significance saturated. Initialization and propagation delays, such as register load time, AND-chain decoding, and the incrementer delays of the half adders, dictated the operating frequency. Subsequent methodologies improved counter operating frequency by using half adders in the parallel counting modules, so that carry signals generated at counting modules of lower significance served as the count enables for counting modules of higher significance, essentially implementing a carry chain from modules of lower significance to modules of higher significance. The carry chain cascaded synchronously through intermediate D-type flip-flops (DFFs). The maximum operating frequency was limited by the half-adder module delay, the DFF access time, and the detector logic delay. Since the module outputs did not directly represent the count state, the detector logic further decoded the module outputs into the output count value.
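The enable condition of such a partitioned counter can be sketched behaviorally: the counter is split into equal-width modules, and a module advances only when all modules of lower significance are saturated. The module width and function names are illustrative assumptions, and the sketch models only the counting behavior, not the delays that limited the operating frequency.

```python
def step_partitioned(modules: list[int], width: int) -> list[int]:
    """Advance a counter split into `width`-bit modules (index 0 = least significant).

    Module 0 always counts; module j counts only when every lower module is
    saturated (all ones) in the current state, as in a synchronous design.
    """
    full = (1 << width) - 1
    nxt = list(modules)
    enable = True  # module 0 is always enabled
    for j, value in enumerate(modules):
        if enable:
            nxt[j] = (value + 1) & full
        enable = enable and value == full  # enable passed to the next module up
    return nxt


def to_int(modules: list[int], width: int) -> int:
    """Reassemble the binary count value from the module outputs."""
    return sum(v << (j * width) for j, v in enumerate(modules))


if __name__ == "__main__":
    mods = [0, 0, 0]  # a hypothetical 6-bit counter built from three 2-bit modules
    for _ in range(6):
        mods = step_partitioned(mods, width=2)
        print(mods, to_int(mods, 2))
```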

Look-ahead clock gating has been shown to be very useful in reducing clock switching power. Computing the clock enabling signals one cycle ahead of time avoids the tight timing constraints of other gating methods. A closed-form model characterizing the power savings was presented and used in the implementation of the gating logic. The gating logic can be further optimized by matching target FFs for joint gating, which may significantly reduce the hardware overhead.
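The look-ahead idea can be checked with a small cycle-level sketch, assuming a hypothetical target FF B whose data input is a combinational function f of source FFs A, with A clocked every cycle. Because A's next state $D_A$ is already known during the current cycle, B's enable for the next cycle, $f(D_A) \oplus f(Q_A)$, can be computed a full cycle ahead and registered, instead of being derived from B's own D and Q late in the cycle. The function f, the 3-bit source register, and the random stimulus are assumptions for illustration only.

```python
import random


def f(a: list[int]) -> int:
    """Hypothetical combinational cone feeding target FF B: D_B = f(Q_A)."""
    return a[0] & (a[1] | a[2])


def look_ahead_enable(d_a: list[int], q_a: list[int]) -> int:
    """B's enable for the NEXT cycle, computed during the current cycle.

    After the coming edge Q_A becomes D_A, so f(D_A) is B's next D input,
    while f(Q_A) is B's current D input, i.e. the value B will hold next.
    """
    return f(d_a) ^ f(q_a)


def simulate(cycles: int, seed: int = 0) -> tuple[int, int]:
    """Run A -> f -> B with look-ahead gating on B; return (pulses to B, cycles)."""
    rng = random.Random(seed)
    q_a, q_b = [0, 0, 0], 0
    en_b = 1  # registered enable; clock B in the first cycle to be safe
    pulses = 0
    for t in range(cycles):
        d_a = [rng.randint(0, 1) for _ in range(3)]  # random stimulus into A
        d_b = f(q_a)
        if t > 0:
            # Sanity check: the registered look-ahead enable equals the
            # conventional D_B XOR Q_B test, but was ready one cycle earlier.
            assert en_b == d_b ^ q_b
        en_next = look_ahead_enable(d_a, q_a)
        # Clock edge: A is always clocked; B only if its registered enable is 1.
        if en_b:
            q_b = d_b
            pulses += 1
        assert q_b == d_b  # B holds the correct value whether or not it was clocked
        q_a, en_b = d_a, en_next
    return pulses, cycles


if __name__ == "__main__":
    pulses, n = simulate(1000, seed=1)
    print(f"B received {pulses} of {n} clock pulses")
```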



Fig. 3 Look Ahead clock gating

While that technique addressed the case of merging two target FFs for joint gating, clustering target FFs into larger groups may yield higher power savings. Several FFs could be driven by a common gater if they were known to toggle simultaneously most of the time, achieving almost the same power reduction with fewer gaters. Grouping may place up to several dozen FFs in a single group and is usually done by synthesizers during the physical design phase. Such tools focus on skew, power, and area minimization and are not aware of the toggling correlations of the underlying FFs.
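A minimal greedy sketch of correlation-aware grouping follows, assuming per-FF toggle traces from simulation encoded as bitmasks over cycles (a hypothetical representation). Merging two groups saves one gater but delivers extra clock pulses to FFs that did not need them; the sketch repeatedly merges the pair of groups that adds the fewest such pulses, subject to a size cap and a threshold. It is an illustrative heuristic, not the grouping algorithm of any specific tool or of this paper.

```python
from itertools import combinations


def enabled_cycles(masks: list[int]) -> int:
    """Cycles in which a joint gater must enable its group: OR of the toggle masks."""
    joint = 0
    for m in masks:
        joint |= m
    return bin(joint).count("1")


def delivered_pulses(group: list[int], toggles: list[int]) -> int:
    """Clock pulses reaching the group's FFs: group size times enabled cycles."""
    return len(group) * enabled_cycles([toggles[i] for i in group])


def greedy_grouping(toggles: list[int], max_size: int, max_extra: int) -> list[list[int]]:
    """Greedily merge FF groups whose toggling is correlated.

    toggles[i] has bit t set if FF i toggles in simulated cycle t (assumed format).
    Merge the pair adding the fewest wasted pulses until that cost exceeds max_extra.
    """
    groups = [[i] for i in range(len(toggles))]
    while True:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            merged = groups[a] + groups[b]
            if len(merged) > max_size:
                continue
            extra = (delivered_pulses(merged, toggles)
                     - delivered_pulses(groups[a], toggles)
                     - delivered_pulses(groups[b], toggles))
            if best is None or extra < best[0]:
                best = (extra, a, b)
        if best is None or best[0] > max_extra:
            return groups
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]  # safe: a < b, so index a is unaffected


if __name__ == "__main__":
    traces = [0b10110010, 0b10110010, 0b01001101, 0b01001100]
    print(greedy_grouping(traces, max_size=2, max_extra=2))  # [[0, 1], [2, 3]]
```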

In the proposed look-ahead clock gating (LACG) technique, the transmission gate used in the existing data-driven clock gating (DDCG) circuit is replaced by an NMOS pass transistor. A transmission gate passes both logic levels without threshold loss, whereas an NMOS pass transistor passes a degraded high level because of its threshold voltage. However, because the pass transistor is followed by an inverter, the output is restored to a full logic level without threshold loss. The transmission gate was replaced by a pass transistor to reduce power and area. Compared with the existing DDCG, the proposed LACG consumes less power and occupies less area.
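As a worked illustration of the threshold loss mentioned above, using assumed values ($V_{DD} = 1.0$ V, $V_{tn} = 0.3$ V) and neglecting the body effect, which in practice raises $V_{tn}$ and increases the loss: an NMOS pass transistor with its gate at $V_{DD}$ stops conducting once its gate-source voltage falls to $V_{tn}$, so the highest level it can pass is

$$
V_{out,\max} = V_{DD} - V_{tn} = 1.0\ \text{V} - 0.3\ \text{V} = 0.7\ \text{V}
$$

Provided this degraded level lies above the switching threshold of the following inverter, the inverter still drives a full-swing output of $0$ or $V_{DD}$.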

## 5. IMPLEMENTATION AND SIMULATION RESULTS

The design flow described in Section 3 was evaluated on a DSP core comprising 22k FFs, another large vectored DSP core comprising 100k FFs, a 3-D graphics accelerator [9], and a network processor control block. The resulting power was examined over a wide range of group sizes to identify where the maximum power savings are achieved. The results obtained with the LACG technique include not only the clock network and sequential power but also the power consumed in the combinational logic, which is about half of the total dynamic power. Hence, considering the clocking power alone, savings of 15%-20% are achieved. Since the toggling probability is averaged across all FFs, different sub-blocks may have different probabilities. It is therefore possible to reduce the power further by using different k values in different sub-blocks. Another interesting observation is the slight growth in combinational logic power, due to the extra XOR gate connected at every flip-flop and the other logic involved in look-ahead clock gating.



Fig. 4 Simulation output of data-driven clock gating

The results of the combined synthesis-based and data-based gating scheme are worse than those of look-ahead clock gating alone for all the circuits. Thus, unless a block such as a register file is to undergo only synthesis-based gating, with no data-based gating applied to it, synthesis-based gating should be replaced entirely by data-based gating. As mentioned earlier, the gating scheme may have considerable timing implications.



Fig. 5 Simulation output of look-ahead clock gating



Fig. 6 Power analysis

## 6. CONCLUSION

In this paper, we have proposed a novel approach for the construction of activity-driven clock trees with the objective of minimizing power consumption. We have developed algorithms that solve the problems of clock tree construction and gate insertion into the clock tree while minimizing power consumption and producing small clock skew.

In a power-managed system, the state of operation of various components is dynamically adapted to the required performance level, in an effort to minimize the power wasted by idle or underutilized components. For most system components, state transitions have non-negligible power and performance costs. Thus, designing power management policies that minimize power under performance constraints is a challenging problem. We surveyed several classes of power-managed systems and power management policies. Furthermore, we analyzed the trade-offs involved in designing and implementing power-managed systems. Several practical examples of power-managed systems were analyzed and discussed in detail.

Look-ahead clock gating has been shown to be very useful in reducing clock switching power. Computing the clock enabling signals one cycle ahead of time avoids the tight timing constraints of other gating methods. A closed-form model characterizing the power savings was presented and used in the implementation of the gating logic. The gating logic can be further optimized by matching target FFs for joint gating, which may significantly reduce the hardware overhead. While this paper discussed the case of merging two target FFs for joint gating, clustering target FFs into larger groups may yield higher power savings. This is a matter for further research.

The technique used is referred to as sequential look-ahead. Additional state bits are added (beyond those that hold the usual binary code output by the counter), each defined to represent a useful logic function of the original counter state bits (such as the logical AND of a number of bits). Using finite-state-machine terminology, the next-state equations are then re-expressed using this additional bit. Given a suitable choice of the additional state bit (referred to as the 'look-ahead' bit), the next-state equations are simplified and are therefore likely to permit a faster implementation. Of course, the look-ahead bit itself must not require excessive computation time.
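A minimal sketch of these next-state equations follows, assuming a counter split into an m-bit low part and a high part, with a single look-ahead bit defined as the AND of the low bits; the split, the widths, and the names are illustrative assumptions. The high part's increment condition becomes one registered bit instead of an m-input AND, while the look-ahead bit's own next value is derived from the current low bits.

```python
def next_state(low: int, high: int, la: int, m: int, n_high: int) -> tuple[int, int, int]:
    """Next-state equations of an (m + n_high)-bit counter with one look-ahead bit.

    `la` is an extra registered state bit defined to equal AND(low bits),
    i.e. low == 2**m - 1 (an illustrative choice of look-ahead function).
    """
    low_full = (1 << m) - 1
    high_mask = (1 << n_high) - 1
    low_next = (low + 1) & low_full
    # The high part uses the single registered bit instead of an m-input AND.
    high_next = (high + la) & high_mask
    # la must equal AND(low_next), which holds exactly when low == 2**m - 2.
    la_next = int(low == low_full - 1)
    return low_next, high_next, la_next


if __name__ == "__main__":
    m, n_high = 3, 3
    low, high, la = 0, 0, 0  # la = AND(low bits) = 0, consistent with low = 0
    for _ in range(10):
        low, high, la = next_state(low, high, la, m, n_high)
        print((high << m) | low, la)
```

This behavioral model only checks that the re-expressed equations still count correctly; the speed benefit arises in hardware, where the high part no longer waits for a wide decode of the low bits.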

## REFERENCES

- [1] V. G. Oklobdzija, Digital System Clocking: High-Performance and Low-Power Aspects. New York, NY, USA: Wiley, 2003.
- [2] R. Goering, "Low-power IC design techniques may perturb the entire flow," EE Times, May 7, 2007.
- [3] Q. Wu, M. Pedram, and X. Wu, "Clock-gating and its application to low power design of sequential circuits," IEEE Trans. Circuits Syst. I, vol. 47, no. 3, pp. 415–420, Mar. 2000.
- [4] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, "Ultra low power clocking scheme using energy recovery and clock gating," IEEE Trans. Very Large Scale Integr. Syst., vol. 17, no. 1, pp. 33–44, Jan. 2009.
- [5] A. Khan, P. Watson, G. Kuo, D. Le, T. Nguyen, S. Yang, P. Bennett, P. Huang, J. Gill, C. Hawkins, J. Goodenough, D. Wang, I. Ahmed, P. Tran, H. Mak, O. Kim, F. Martin, Y. Fan, D. Ge, J. Kung, and V. Shek, "A 90 nm power optimization methodology with application to the ARM 1136JF-S microprocessor," IEEE J. Solid-State Circuits, vol. 41, no. 8, pp. 1707–1717, Aug. 2006.
- [6] T. Luo, D. Newmark, and D. Z. Pan, "Total power optimization combining placement, sizing and multi-Vt through slack distribution management," in Proc. IEEE/ACM Asia South Pacific Des. Autom. Conf., Mar. 2008, pp. 352–357.
- [7] D.-S. Chiou, S.-H. Chen, S.-C. Chang, and C. Yeh, "Timing driven power gating," in Proc. ACM/IEEE Des. Autom. Conf., Sep. 2006, pp. 121–124.
- [8] H. Xu, R. Vemuri, and W. Jone, "Dynamic characteristics of power gating during mode transition," IEEE Trans. Very Large Scale Integr. Syst., vol. 19, no. 2, pp. 237–249, Feb. 2011.
- [9] G. Magklis, M. L. Scott, G. Semeraro, D. H. Albonesi, and S. Dropsho, "Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor," in Proc. ACM Int. Symp. Comput. Architecture, 2003, pp. 14–27.
- [10] L. Yan, J. Luo, and N. Jha, "Combined dynamic voltage scaling and adaptive body biasing for heterogeneous distributed real-time embedded systems," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2003, pp. 30–37.
- [11] M. Donno, E. Macii, and L. Mazzoni, "Power-aware clock tree planning," in Proc. Int. Symp. Phys. Design, 2004, pp. 138–147.
- [12] SpyGlass Power [Online]. Available: http://www.atrenta.com/solutions/spyglass-family/spyglasspower.htm
- [13] S. Wimer and I. Koren, "The Optimal fan-out of clock network for power minimization by adaptive gating," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1772–1780, Oct. 2012.
- [14] S. Wimer and I. Koren, "The Optimal fan-out of clock network for power minimization by adaptive gating," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1772–1780, Oct. 2012.
- [15] L. Benini, A. Bogliolo, and G. De Micheli, "A survey on design techniques for system-level dynamic power management," IEEE Trans. VLSI Syst., vol. 8, no. 3, pp. 299–316, June 2000.
- [16] M. S. Hosny and W. Yuejian, "Low power clocking strategies in deep submicron technologies," in Proc. IEEE Int. Conf. Integr. Circuit Design Technol., ICICDT 2008, pp. 143–146.
- [17] C. Chunhong, K. Changjun, and S. Majid, "Activity-sensitive clock tree construction for low power," in Proc. ISLPED, 2002, pp. 279–282.
- [18] A. Farrahi, C. Chen, A. Srivastava, G. Tellez, and M. Sarrafzadeh, "Activity-driven clock design," IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., vol. 20, no. 6, pp. 705–714, Jun. 2001.

Author



**D. Mahesh Kumar** obtained his B.Sc. in Electronics and M.Sc. in Applied Electronics from PSG College of Arts and Science, Coimbatore, in 1996 and 1998, respectively, and his M.Phil. in Electronics from PSG College of Arts and Science, Coimbatore, in 2006. He completed his Ph.D. in Electronics in March 2018. He has been working in the teaching field for about 16 years. His areas of interest include VLSI design, wireless communication, and embedded systems. He has published many articles in reputed national and international journals, as well as one book, "Textbook of Operational Amplifier and Linear Integrated Circuits" (Macmillan India Ltd., New Delhi).